
    OMARS: The Framework of an Online Multi-Dimensional Association Rules Mining System

    Recently, the integration of data warehouses and data mining has been recognized as the primary platform for facilitating knowledge discovery. Effective data mining from data warehouses, however, requires exploratory data analysis: users often need to investigate warehouse data from various perspectives and analyze it at different levels of abstraction. To this end, comprehensive information processing and data analysis have to be systematically built around data warehouses, and an on-line mining environment should be provided. In this paper, we propose a system framework for on-line association rule mining, called OMARS, which integrates OLAP services with our proposed OLAM cubes and auxiliary cubes. Based on the concept of OLAM cubes, we define the OLAM lattice framework, which exploits arbitrary hierarchies of dimensions to model all possible OLAM data cubes.
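
    To illustrate how dimension hierarchies can span a lattice of cubes, here is a minimal sketch. It is not the OMARS implementation: the dimension names, the level names, and the helper functions are hypothetical, and the sketch only assumes that each cube picks one abstraction level per dimension.

```python
from itertools import product

# Hypothetical dimension hierarchies, ordered from most detailed to most
# general. "ALL" means the dimension is fully aggregated away.
HIERARCHIES = {
    "time": ["day", "month", "year", "ALL"],
    "location": ["city", "country", "ALL"],
    "product": ["item", "category", "ALL"],
}


def enumerate_cuboids(hierarchies):
    """Return every combination of one abstraction level per dimension."""
    dims = list(hierarchies)
    return [
        dict(zip(dims, levels))
        for levels in product(*(hierarchies[d] for d in dims))
    ]


def is_ancestor(general, detailed, hierarchies):
    """True if `general` is at the same or a coarser level than `detailed`
    on every dimension, so it could be derived by rolling up `detailed`."""
    return all(
        hierarchies[d].index(general[d]) >= hierarchies[d].index(detailed[d])
        for d in hierarchies
    )


cuboids = enumerate_cuboids(HIERARCHIES)
print(len(cuboids))  # 4 * 3 * 3 = 36 nodes in this toy lattice
```

    The `is_ancestor` relation gives the partial order of the lattice: a cube at coarser levels can be answered from any cube it is an ancestor of.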

    Effective Ranking and Recommendation on Web Page Retrieval by Integrating Association Mining and PageRank

    Well-known search engines such as Google, Yahoo, and MSN provide users with good search results based on specialized search strategies. However, traditional search engines still leave some problems unsolved: 1) the gap between the user's intention and the returned results is hard to narrow within the global search space, and 2) pages of interest to the user that are hidden within a local website are not associated with the search results. To address these problems, we propose a novel approach for personalized page ranking and recommendation that integrates association mining and PageRank to meet the user's search goals. Moreover, by mining users' browsing behaviors, we bridge the gap between global search results and local preferences. The effectiveness of the proposed approach was verified through experimental evaluations.
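
    The abstract does not say how the two techniques are combined. One plausible reading, sketched below purely as an assumption, is standard personalized PageRank in which the teleport distribution is biased toward pages favored by mined browsing behavior. The link graph, the page names, and the `personalization` weights are all hypothetical.

```python
def pagerank(links, personalization=None, damping=0.85, iterations=100):
    """Power-iteration PageRank over `links` (page -> list of linked pages).

    `personalization` optionally maps pages to non-negative weights; here it
    stands in for preferences mined from browsing behavior (an assumption).
    """
    pages = sorted(set(links) | {q for targets in links.values() for q in targets})
    n = len(pages)
    if personalization is None:
        teleport = {p: 1.0 / n for p in pages}
    else:
        total = sum(personalization.get(p, 0.0) for p in pages)
        if total <= 0:
            raise ValueError("personalization weights must sum to a positive value")
        teleport = {p: personalization.get(p, 0.0) / total for p in pages}

    rank = dict(teleport)
    for _ in range(iterations):
        new_rank = {p: (1 - damping) * teleport[p] for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                share = damping * rank[p] / len(targets)
                for q in targets:
                    new_rank[q] += share
            else:
                # Dangling page: spread its rank according to the teleport vector.
                for q in pages:
                    new_rank[q] += damping * rank[p] * teleport[q]
        rank = new_rank
    return rank


# Hypothetical three-page site.
links = {"a": ["b"], "b": ["c"], "c": ["a"]}
uniform = pagerank(links)
personalized = pagerank(links, personalization={"c": 1.0})
```

    With the teleport vector concentrated on page `c`, that page ranks higher than it does under the uniform setting, which is the kind of local preference the abstract describes.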

    Web image annotation by fusing visual features and textual information

    In this paper, we propose a novel web image annotation method, FMD (Fused annotation by Mixed model graph and Decision tree), which combines visual features and textual information to conceptualize web images. The FMD approach consists of three main processes: 1) construct the visual-based model, ModelMMG; 2) construct the textual-based model, ModelDT; and 3) fuse ModelMMG and ModelDT into ModelFMD for annotating the images. The visual-based annotation model characterizes an image not only by its global content but also by the local content of its component objects. The textual-based annotation model addresses the user-specific dependency of keywords and the heavy computation caused by the high dimensionality of text features. The experimental results show that, by integrating the two types of features, the proposed FMD method is very effective for web image annotation in terms of accuracy.

    Mining and applications of repeating patterns

    Mining valuable knowledge from real data has long been a hot topic. Repeating patterns are one important kind of knowledge, occurring in many real applications such as musical data and medical data. In this paper, our aims are to contribute an efficient mining algorithm for repeating patterns and to build a real application using the mined patterns. Although a number of past studies have addressed repeating pattern mining, their performance is still unsatisfactory to users, especially on large data sets. To address this, we propose an efficient algorithm named Fast Mining of Repeating Patterns, which achieves high performance in discovering repeating patterns through a novel index called Quick-Pattern Index. On the application side, we propose a repeating-pattern-based music recommender system to deal with problems in music recommendation; it can still produce recommendations even when the rating matrix is very sparse. The experimental results show that the proposed mining algorithm and recommender system outperform previous work in terms of efficiency and effectiveness, respectively.
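
    For readers unfamiliar with the task, the sketch below shows what mining repeating patterns means on a toy note sequence. It is a naive brute-force baseline, not the Fast Mining of Repeating Patterns algorithm or its Quick-Pattern Index, whose details the abstract does not give. The function name, thresholds, and sample sequence are hypothetical.

```python
from collections import Counter


def repeating_patterns(sequence, min_freq=2, min_len=2):
    """Count every contiguous subsequence of length >= min_len and keep those
    occurring at least min_freq times (overlapping occurrences are counted).

    Brute force: it enumerates O(n^2) subsequences, so it only suits
    short inputs.
    """
    counts = Counter(
        tuple(sequence[i:i + n])
        for n in range(min_len, len(sequence) + 1)
        for i in range(len(sequence) - n + 1)
    )
    return {pattern: c for pattern, c in counts.items() if c >= min_freq}


notes = ["C", "D", "E", "C", "D", "E", "G"]
patterns = repeating_patterns(notes)
# patterns == {("C", "D"): 2, ("D", "E"): 2, ("C", "D", "E"): 2}
```

    The quadratic number of candidates is what makes the naive approach slow on large data sets, which is the performance problem the abstract refers to.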

    Discovery of temporal association rules with hierarchical granular framework

    Most existing studies in temporal data mining consider only the lifespan of items when finding general temporal association rules. However, an item that is infrequent over the entire time period may be frequent within part of it. We therefore organize time into granules and consider temporal data mining at different levels of granules. In addition, an item may not be available when a store opens. In this paper, we use the first transaction that includes an item as the item's start point; before the start point, the item may not be bought. A three-phase mining framework that takes this item lifespan definition into account is designed. Finally, experiments were conducted to demonstrate the performance of the proposed framework.
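
    The effect of the start-point definition and of time granules can be seen in a small sketch. This is not the paper's three-phase framework; the function names, the fixed-size granules, and the sample transactions are assumptions made for illustration.

```python
def lifespan_support(transactions, item):
    """Return (overall_support, lifespan_support) for `item`.

    `transactions` is a time-ordered list of item sets. The lifespan starts
    at the first transaction containing the item. Both values are 0.0 if the
    item never appears.
    """
    start = next((i for i, t in enumerate(transactions) if item in t), None)
    if start is None:
        return 0.0, 0.0
    count = sum(1 for t in transactions if item in t)
    overall = count / len(transactions)
    within_lifespan = count / (len(transactions) - start)
    return overall, within_lifespan


def granule_supports(transactions, item, granule_size):
    """Support of `item` inside each consecutive block of `granule_size`
    transactions (a simplified stand-in for time granules)."""
    supports = []
    for i in range(0, len(transactions), granule_size):
        block = transactions[i:i + granule_size]
        supports.append(sum(1 for t in block if item in t) / len(block))
    return supports


transactions = [
    {"milk"}, {"bread"}, {"milk", "bread"}, {"milk"},
    {"bread"}, {"milk"}, {"tea", "milk"}, {"tea"},
]
print(lifespan_support(transactions, "tea"))     # (0.25, 1.0)
print(granule_supports(transactions, "tea", 4))  # [0.0, 0.5]
```

    Here "tea" looks infrequent over the whole period (support 0.25) but is frequent once only the transactions from its start point onward are counted.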

    Effective Invasiveness Recognition of Imbalanced Data by Semi-Automated Segmentations of Lung Nodules

    Over the past few decades, the recognition of early lung cancers has been studied to enable effective treatment. In early lung cancers, invasiveness is an important factor in expected survival rates. Hence, how to effectively identify invasiveness from computed tomography (CT) images has become a hot topic in biomedical science. Although a number of previous works have been shown to be effective, some problems remain unsettled. First, a large amount of labeled data is needed for better prediction, but manual labeling is costly. Second, accuracy is limited on imbalanced data. To alleviate these problems, we propose an effective CT invasiveness recognizer based on semi-automated segmentation. The semi-automated segmentation makes it easy for doctors to mark nodules: from a single clicked pixel, a nodule object in a CT image is marked by fusing two proposed segmentation methods, thresholding-based morphology and a deep learning-based mask region-based convolutional neural network (Mask-RCNN). For thresholding-based morphology, an initial segmentation is derived by adaptive pixel connections, and a mathematical morphology operation is then applied to refine it. For the deep learning-based Mask-RCNN, the anchor is fixed by the clicked pixel to reduce computational complexity. To combine the advantages of both, the segmentation switches between these two sub-methods. After the nodules are segmented, a boosting ensemble classification model with feature selection identifies the invasiveness using equalized down-sampling. Extensive experimental results on a real dataset show that the proposed segmentation method performs better than traditional segmentation methods, reaching an average Dice improvement of 392.3%. Additionally, the proposed ensemble classification model performs better than the compared method, reaching an area under the curve (AUC) improvement of 5.3% and a specificity improvement of 14.3%. Moreover, in comparison with models trained on imbalanced data, the improvements in AUC and specificity reach 10.4% and 33.3%, respectively.
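
    The segmentation results above are reported with the Dice score. As a reminder of what that metric measures, here is a minimal sketch of the standard Dice coefficient on binary masks. The masks are made-up toy data, and the sketch says nothing about how the paper computes its reported improvement.

```python
def dice_coefficient(pred, truth):
    """Dice = 2 * |A ∩ B| / (|A| + |B|) for two binary masks given as
    equally shaped nested lists of 0/1 values."""
    flat_pred = [p for row in pred for p in row]
    flat_truth = [t for row in truth for t in row]
    if len(flat_pred) != len(flat_truth):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(flat_pred, flat_truth) if p and t)
    total = sum(1 for p in flat_pred if p) + sum(1 for t in flat_truth if t)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2 * intersection / total


# Toy 2x3 masks: 1 marks a nodule pixel.
predicted = [[0, 1, 1],
             [0, 1, 0]]
reference = [[0, 1, 0],
             [0, 1, 1]]
score = dice_coefficient(predicted, reference)  # 2 * 2 / (3 + 3)
```

    A score of 1.0 means the predicted and reference masks coincide exactly, and 0.0 means they share no foreground pixels.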